What's up LOD Cloud? Observing The State of Linked Open Data Cloud Metadata
Authors
Abstract
Linked Open Data (LOD) has emerged as one of the largest collections of interlinked datasets on the web. To benefit from this mine of data, one needs access to descriptive information about each dataset, that is, its metadata. However, the heterogeneous nature of the data sources directly affects data quality, as these sources often contain inconsistent, misinterpreted, and incomplete metadata. Given the significant variation in size, in the languages used, and in the freshness of the data, finding useful datasets without prior knowledge is increasingly complicated. We have developed Roomba, a tool that validates, corrects, and generates dataset metadata. In this paper, we present the results of running this tool on the parts of the LOD cloud accessible via the datahub.io API. The results demonstrate that the general state of the datasets needs more attention: most of them suffer from poor-quality metadata and lack some informative metrics needed to facilitate dataset search. We also show that the automatic corrections made by Roomba increase the overall quality of the dataset metadata, and we highlight the need for manual effort to supply some important missing information.
Similar resources
Roomba: Automatic Validation, Correction and Generation of Dataset Metadata
Data is being published by both the public and private sectors and covers a diverse set of domains ranging from life sciences to media or government data. An example is the Linked Open Data (LOD) cloud which is potentially a gold mine for organizations and individuals who are trying to leverage external data sources in order to produce more informed business decisions. Considering the significa...
Linking FRBR Entities to LOD through Semantic Matching
In this paper, we present an approach to automatically link FRBR works identified in metadata to the corresponding entity in Linked Open Data resources. The main contribution is a basis for semantic enrichment and verification of works identified in existing metadata. Through experiments, we demonstrate that FRBR works can be identified in the LOD cloud, which provides a solid ground for further work.
Adoption of the Linked Data Best Practices in Different Topical Domains
The central idea of Linked Data is that data publishers support applications in discovering and integrating data by complying with a set of best practices in the areas of linking, vocabulary usage, and metadata provision. In 2011, the State of the LOD Cloud report analyzed the adoption of these best practices by linked datasets within different topical domains. The report was based on information...
The Open Linguistics Working Group: Developing the Linguistic Linked Open Data Cloud
The Open Linguistics Working Group (OWLG) brings together researchers from various fields of linguistics, natural language processing, and information technology to present and discuss principles, case studies, and best practices for representing, publishing and linking linguistic data collections. A major outcome of our work is the Linguistic Linked Open Data (LLOD) cloud, an LOD (sub-)cloud o...
Creating and Publishing Metadata of Linked Data —Providing Shoes for the Cobbler’s Children
The number of open datasets available on the web is increasing rapidly with the rise of the Linked Open Data (LOD) cloud and various governmental efforts for releasing public data. However, the metadata available for the datasets is often minimal, heterogeneous, and distributed, which makes finding a suitable dataset for a given need problematic. To address the problem, we present a distributed ...